Integrating Algolia
1. Configure config.json
The config.json file is the configuration that the Algolia crawler we use later needs to read. Place it in the src directory.
config.json
{
  "index_name": "ned-wiki",
  // Note: do not put a DNS-redirected URL here. I deploy on Vercel, so this is my actual URL.
  "start_urls": ["https://blog-sample-ivory.vercel.app/"],
  // The crawler walks sitemap.xml to find pages to crawl.
  // Keep sitemap.xml up to date. File names must be in English, or at least start with an English prefix.
  // Plugin: https://docusaurus.io/docs/api/plugins/@docusaurus/plugin-sitemap
  "sitemap_urls": ["http://nedtextbook.com/sitemap.xml"],
  "sitemap_alternate_links": true,
  "stop_urls": [],
  "selectors": {
    "lvl0": "header h1",
    "lvl1": "article h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5",
    "text": "article p"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag",
      "lang"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  }
}
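Note that start_urls must be your site's real URL. For example, my DNS domain is https://nedtextbook.com, but because the site is deployed on Vercel, its real URL is https://blog-sample-ivory.vercel.app/
If this URL is wrong, the crawler will not collect any data and will report Nb hits: 0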
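
Before running the crawler, it may help to sanity-check the file locally. The sketch below is only an illustration, not part of the Algolia tooling: the helper names load_config and check_config are hypothetical, and it assumes comments appear only on their own lines, as in the config above. Strict JSON parsers reject // comments, so the loader drops those lines before parsing and then checks that index_name and start_urls are present.

```python
import json


def load_config(text):
    """Parse config text, ignoring whole-line // comments (hypothetical helper)."""
    # Only lines that START with // are dropped, so "https://..." values are untouched.
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def check_config(config):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not config.get("index_name"):
        problems.append("index_name is missing or empty")
    start_urls = config.get("start_urls") or []
    if not start_urls:
        problems.append("start_urls is missing or empty")
    for url in start_urls:
        if not url.startswith(("http://", "https://")):
            problems.append("start_urls entry is not an absolute URL: " + url)
    return problems


# A shortened copy of the config above, used here as sample input.
sample = """{
  "index_name": "ned-wiki",
  // I deploy on Vercel, so this is my actual URL.
  "start_urls": ["https://blog-sample-ivory.vercel.app/"]
}"""

config = load_config(sample)
print(check_config(config))
```

This check cannot tell whether start_urls points at the real deployment URL rather than a DNS redirect; that still has to be confirmed by hand, for example by looking at the Nb hits value the crawler reports.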